English Lexical Sample Task Description

Author

  • Adam Kilgarriff
Abstract

The English lexical sample task (adjectives and nouns) for SENSEVAL-2 was set up according to the same principles as for SENSEVAL-1, as reported in Kilgarriff and Rosenzweig (2000). (Adjectives and nouns only, because the data preparation for the verbs lexical sample was undertaken alongside that for the English all-words task, and is reported in Palmer et al. (this volume). All discussion below up to the Results section covers only adjectives and nouns.)

1 Lexical sample

The lexicon was sampled to give a range of low, medium and high frequency words (see Table 1). These were all different words to the ones used in SENSEVAL-1.

2 Corpus choice

For the most part, the British National Corpus (New edition) was used. (The new edition has the advantage that it is available worldwide, so all participants had the opportunity of obtaining it for system training.) Our goal was to match this source, containing British English, with another, of American English. In the event, only limited quantities of corpus data for American English were available without copyright complications, so the lion's share of the data was from the BNC with a limited quantity from the Wall Street Journal.

In accordance with standard SENSEVAL procedure, the goal was to have 75 + 15n + 6m instances for each lexical-sample word, where n is the number of senses the word has and m is the number of multiword expressions that the word is part of (both, of course, relative to a specific lexicon). In practice numbers varied slightly, as instances were deleted because they had the wrong part of speech or were otherwise unusable. See Table 1 for actual numbers of senses, multiword expressions and instances.

3 Lexicon choice

Here lay the biggest contrast with the SENSEVAL-1 task, which had used Oxford University Press's experimental HECTOR lexicon. This time, in response to popular acclaim, WordNet was used. Since SENSEVAL was first mooted, in 1997, WordNet-or-not-WordNet has been a recurring theme.
In favour was the argument that it was already very widely used, almost a de facto standard. The argument against concerned its sense distinctions. WordNet, like thesauruses but unlike standard dictionaries, is organised around groups of words of similar meanings (synsets), not around words (with their various meanings). This means that the priority for the lexicographer is building coherent synsets rather than the coherent analysis of the various meanings of a …
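
As a small illustration of the instance-count target stated under Corpus choice (75 + 15n + 6m), the arithmetic can be sketched in Python. The function name and the example word profile below are hypothetical and are not taken from Table 1; the sketch computes only the target, not the slightly different actual counts that resulted after unusable instances were deleted.

```python
def target_instances(n_senses: int, n_multiwords: int) -> int:
    """Target number of instances for one lexical-sample word.

    Implements the stated formula 75 + 15n + 6m, where n is the number
    of senses and m the number of multiword expressions the word is
    part of (both relative to a specific lexicon).
    """
    if n_senses < 0 or n_multiwords < 0:
        raise ValueError("counts must be non-negative")
    return 75 + 15 * n_senses + 6 * n_multiwords


# Hypothetical word with 4 senses and 2 multiword expressions:
# 75 + 15*4 + 6*2 = 75 + 60 + 12 = 147
print(target_instances(4, 2))
```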





Publication date: 2001